Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation
Abstract
This paper presents a method for evaluating spoken post-editing, in which a human translator corrects imperfect machine translation output by speech. We compare two approaches to combining machine translation (MT) and automatic speech recognition (ASR): a heuristic algorithm and a machine learning method. To obtain a data set with spoken post-editing information, we use the French versions of TED talks as the source texts submitted to MT, and their spoken English counterparts as the corrections, which are submitted to an ASR system. We experiment with various levels of artificial ASR noise as well as with a state-of-the-art ASR system. The results show that combining MT with ASR yields higher BLEU scores than either the MT output or the ASR output alone, especially when ASR performance is low.
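The evaluation described above varies the amount of artificial ASR noise and reports BLEU. The Python sketch below shows one way such a scoring loop could look. The noise model (each word independently hit by a substitution, deletion, or insertion at a chosen rate), the function names `add_asr_noise` and `corpus_bleu`, and the toy sentence are hypothetical illustrations, not the paper's procedure; the MT/ASR combination methods themselves are not reproduced. The BLEU function follows the standard definition (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty) with one reference per segment and no smoothing.

```python
import math
import random
from collections import Counter


def add_asr_noise(words, error_rate, vocab, rng):
    """Hypothetical ASR noise model: each word independently receives an
    error with probability error_rate; the error is a substitution, a
    deletion, or an insertion, chosen uniformly at random."""
    noisy = []
    for w in words:
        if rng.random() >= error_rate:
            noisy.append(w)  # word passes through unchanged
            continue
        op = rng.choice(("sub", "del", "ins"))
        if op == "sub":
            noisy.append(rng.choice(vocab))  # may occasionally pick w itself
        elif op == "ins":
            noisy.extend([w, rng.choice(vocab)])  # keep w, add a spurious word
        # op == "del": the word is simply dropped
    return noisy


def ngrams(words, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: clipped n-gram precisions for n = 1..max_n,
    geometric mean, brevity penalty; one reference per segment, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            for gram, count in ngrams(hyp, n).items():
                matches[n - 1] += min(count, ref_counts[gram])  # clipped count
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match: unsmoothed BLEU is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return math.exp(log_precision) * brevity


# Toy example: score increasingly noisy "ASR" versions of one spoken correction.
reference = "we use the spoken english talks as corrections of the machine translation".split()
vocab = sorted(set(reference))
rng = random.Random(0)
for rate in (0.0, 0.1, 0.3):
    noisy = add_asr_noise(reference, rate, vocab, rng)
    print(f"noise rate {rate:.1f}: BLEU = {corpus_bleu([noisy], [reference]):.3f}")
```

Because n-gram counts are pooled across all segments before the geometric mean is taken, a single segment with no 4-gram match does not zero out the corpus score, as it would with unsmoothed sentence-level BLEU. A real evaluation would more likely rely on an established implementation such as sacreBLEU, which applies its own tokenization rather than the whitespace splitting assumed here.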
Similar resources
Traduire la parole : le cas des TED Talks (Translating speech: the case of the TED Talks)
The continuous improvement of the quality of machine translation and of speech recognition systems opens new perspectives for the development of spoken translation applications. In this study, based on our own experience with the development of Spoken Translation Systems (STS) for conferences, we analyze and quantify the main difficulties raised by STSs, and discuss possible strategies to migit...
The MSR SYSTEM for IWSLT 2011 evaluation
This paper describes the Microsoft Research (MSR) system for the evaluation campaign of the 2011 international workshop on spoken language translation. The evaluation task is to translate TED talks (www.ted.com). This task presents two unique challenges: First, the underlying topic switches sharply from talk to talk. Therefore, the translation system needs to adapt to the current topic quickly ...
The IWSLT 2016 Evaluation Campaign
The IWSLT 2016 Evaluation Campaign featured two tasks: the translation of talks and the translation of video conference conversations. While the first task extends previously offered tasks with talks from a different source, the second task is completely new. For both tasks, three tracks were organised: automatic speech recognition (ASR), spoken language translation (SLT), and machine translati...
Applying Statistical Post-Editing to English-to-Korean Rule-based Machine Translation System
Conventional rule-based machine translation systems are weak in fluency when generating the target language. In particular, when translating spoken English into Korean, the fluency of the translation is as important as its adequacy for readability and understanding. This problem is more severe for language pairs such as English-Korean. It’s because Englis...
Report on the 10th IWSLT Evaluation Campaign
The paper overviews the tenth evaluation campaign organized by the IWSLT workshop. The 2013 evaluation offered multiple tracks on lecture transcription and translation based on the TED Talks corpus. In particular, this year IWSLT included two automatic speech recognition tracks, on English and German, three speech translation tracks, from English to French, English to German, and German to Engl...